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Abstract 

In this paper, we consider the following inverse maintenance problem: given A S and a 

number of rounds r, at round k, we receive a n x n diagonal matrix and we wish to maintain 
an efficient linear system solver for A^D^^^A under the assumption does not change too 
rapidly. This inverse maintenance problem is the computational bottleneck in solving multiple 
optimization problems. We show how to solve this problem with O (nnz(A) + (i“) preprocessing 
time and amortized 0(nnz(A) + cP) time per round, improving upon previous running times. 

Consequently, we obtain the fastest known running times for solving multiple problems 
including, linear programming and computing a rounding of a polytope. In particular given 
a feasible point in a linear program with n variables, d constraints, and constraint matrix 
A G we show how to solve the linear program in time 0((nnz(A) + d^)'/d\og{€~^)). 

We achieve our results through a novel combination of classic numerical techniques of low rank 
update, preconditioning, and fast matrix multiplication as well as recent work on subspace 
embeddings and spectral sparsification that we hope will be of independent interest. 


1 Introduction 

Solving a sequence of linear systems is the computational bottleneck in many state-of-the-art opti¬ 
mization algorithms, including interior point methods for linear programming [12, 29, 30, 19], the 
Dikin walk for sampling a point in a polytope [9] , and multiplicative weight algorithms for grouped 
least squares problem [I], etc. In full generality, any particular iteration of these algorithms could 
require solving an arbitrary positive definite (PD) linear system. However, the PD matrices involved 
in these algorithms do not change too much between iterations and therefore the average cost per 
iteration can possibly be improved by maintaining an approximate inverse for the matrices involved. 

This insight has been leveraged extensively in the field of interior point methods for linear 
programming. Combining this insight with classic numerical machinery including preconditioning, 
low ranks update, and fast matrix multiplication has lead to multiple non-trivial improvements to 
the state-of-the art running time for linear programming [28, 29, 33, 19, 12]. Prior to our previous 
work [19], the fastest algorithm for solving a general linear program with d variables, n constraints, 
and constraint matrix A G i.e. solving min^-^gcr^x, depended intricately on the precise 

ratio of d, n, and nnz(A) the number of non-zero entries in A (See In Figure 1. 1). While, these 
running times were recently improved by our work in [19], the running time we achieved was still a 
complicated bound of 0{\/^{d’^ + nd‘^~^r~^ -b dr^)) and 0(\/d(nnz(A) -|- dP)Y 

to compute an e-approximate solution where fd G [^j 1] and r > 1 are free parameters to be tuned 
and u) < 2.3729 is the matrix multiplication constant [36, 7]. 

^In this paper we use O to hide factors polylogarithmic in d, n and the width U. See Sec 7 for the definition of U. 
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Figure 1.1: The figure shows the fastest algorithms for solving a linear program, min^^^gc^x, 
before [19]. The horizontal axis is the number of constraints n written as a function of d, the 
vertical access is the number of nonzero entries in A denoted by z. Each region of the diagram is 
the previous best running time for obtaining one bit of precision in solving a linear program with 
the given parameters. For more information on how these times were computed, see Appendix B. 


In this paper we further improve upon these results. We cast this computational bottleneck 
of solving a sequence of slowly changing linear systems as a self-contained problem we call the 
inverse maintenance problem and improve upon the previous best running times for solving this 
problem. This yields an improved running time of 0((nnz(A) -|- d'^)^/dlog{e~^)) for solving a linear 
program and improves upon the running time of various problems including multicommodity flow 
and computing rounding of an ellipse. We achieve our result by a novel application of recent 
numerical machinery such as subspace embeddings and spectral sparsification to classic techniques 
for solving this problem which we hope will be of independent interest. 

1.1 The Inverse Maintenance Problem 

The computational bottleneck we address in this paper is as follows. We are given a fixed matrix 
A G and an number of rounds r. In each round k G [r] we are given a non-negative vector 

g 'R.yQ and we wish to maintain access to an approximate inverse for the matrix (A^D^^^A)”^ 

where G is a diagonal matrix with = d[^\ We call this problem the inverse 

maintenance problem and wish to keep the cost of both maintaining the approximate inverse and 
applying it as low as possible under certain stability assumptions on 

For our applications we do not require that the approximate inverse to be stored explicitly. 
Rather, we simply need an algorithm to solve the linear system A^D^^^Ax = b efficiently for any 
input 5 G M'^. We formally define the requirements of this solver and define the inverse maintenance 
problem below. ^ 

Definition 1 (Linear System Solver). An algorithm S is a 7~-time solver of a PD matrix M G 
if for all 6 G and e G (0,1/2], the algorithm outputs a vector S(6, e) G in time 0(Tlog(e“^)) 
such that with high probability in n, ||s(6, e) — < e||M~^6||^. We call the algorithm S 

linear if S(6, e) = for some G that depends only on M and e. 

^In Appendix 6 we show that the problem of maintaining a linear solver and maintaining an approximate inverse 
are nearly equivalent. 
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Definition 2 (Inverse Maintenance Problem). We are given a matrix A G and a number of 

rounds r > 0. In each round k G [r] we receive 3-^^ G M”o wish to find a 7~-time solver for 
PD such that both T and the cost of constructing is small. (See Algorithm 1.) 


Algorithm 1: Inverse Maintenance Problem 

Input: Full rank matrix A G initial scaling 3 ^^ G M>q, the number of rounds r > 0. 

for Each round k G [r] do 

Input: Current point 3^^ G M”q 

Output: Solver for A^D^^^A where G is diagonal with 

end 


The inverse maintenance problem directly encompasses a computational bottleneck in many 
interior point methods for linear programming [17, 18, 30] as well as similar methods for sampling 
a point in a polytope and computing approximate John Ellipsoids. (See Section 7 and 8.) Without 
further assumptions, we would be forced to pay the cost of solving a general system, i.e. 0(nnz(A) + 
d‘^) [26, 21, 3] where w < 2.3729 is the matrix multiplication constant [36, 7]. However, in these 
algorithms the cannot change arbitrarily and we develop algorithms to exploit this property. 

We consider two different assumptions on how can change. The first assumption, we call 
the £2 stability assumption, and is met in most short-step interior point methods and many of our 
applications. Under this assumption, the multiplicative change from 3^^ to is bounded in £2 

norm, indicating that multiplicative changes to coordinates happen infrequently on average. The 
second assumption, we call the a stability assumption, is weaker and holds both for these algorithms 
as well as a new interior point method we provided recently in [19] to improve the running time 
for linear programming. Under this assumption, the multiplicative change from 3^'^ to 3^^^^ is 
bounded in a norm induced by the leverage scores, a common quantity used to measure 

the importance of rows of a matrix (See Section 2.3). Here, multiplicative changes to these important 
rows happen infrequently on average (See Section 4). 

Definition 3 {£2 Stability Assumption). The inverse maintenance problem satisfies the £2 stabil¬ 
ity assumption if for all k G [r] we have || log(fi^^^) — log(d ^^“^^)||2 < 0.1 and A'^D^*^)A ^ 
A^D('=)A ^ ,dA^D(°)A for ^ = poly(n). 

Definition 4 (cr Stability Assumption). We say that the inverse maintenance problem satisfies 
the a stability assumption if for each k G [r] we have ||log(t?^^) — log(c?^“^^)||< 0.1, 

II log(cZ^^^) — log((?^“^^)||^ < 0.1, and /^“^A^D^^^A ^ A'^D^^^A ^ /JA^D^'^^A for (5 = poly(n). 

Note that many inverse maintenance algorithms do not require the condition /3“^A^D^*^^A ^ 
A2^D(^)A ^ /3 A^D(o)A ; However, this is a mild technical assumption which holds for all applica¬ 
tions we are interested in and it allows our algorithms to achieve further running time improvements. 

Also note that the constant 0.1 in these definitions is arbitrary. If 3^^ changes more quickly 
than this, then in many cases we can simply add intermediate 3^'^ to make the conditions hold and 
only changes the number of rounds by a constant. 

Finally, note that the a stability assumption is strictly weaker the £2 stability assumption as 
Ci < 1 (See Section 2.3). We use £^ assumption mainly for comparison to previous works and as 
a warm up case for our paper. In this paper, our central goal is to provide efficient algorithms for 
solving the inverse maintenance under the weaker cr stability assumption. 
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Year 

Author 

Amortized Cost Per Iteration 

Best known linear system solver [26, 21, 3] 

0{nnz{A) + 

1984 

Karmarkar [12] 

0{nnz{A) -|- -g 

1989 

Nesterov and Nemirovskii [27, 28, Thm 8.4.1] 

Ql"^0.5^1.3729 _|_ ^0.1718^1.9216 _|_ ^^1.8987^ 

1989 

Vaidya [33] 

0{nnz{A) + d"^ + 7^0-8334^1.1441^ 

2014 

Lee and Sidford [17] 

0{nnz{A) + d^ + 77,0.8260^1.1340^ 

2015 

This paper 

0{nnz{A) + d^) 


Figure 1.2: Algorithms for solving the inverse maintenance problem under stability guarantee. 
To simplify comparison we let the amortized cost per iteration denote the worst of the average per 
iteration cost in maintaining the 7~-time solver and 7~. 


1.2 Previous Work 

Work on the inverse maintenance problem dates back to the dawn of interior point methods, a broad 
class of iterative methods for optimization. In 1984, in the first proof of an interior point method 
solving a linear program in polynomial time, Karmarkar [12] observed that his algorithm needed to 
solve the inverse maintenance problem under the £2 stability guarantee. Under this assnmption if 
one entry of changes by a constant then the rest barely change at all. Therefore the updates 
to are essentially low rank, i.e. on average they are well approximated by the addition 

of a rank 1 matrix. He noted that it sufficed to maintain an approximation (A^D^^^A)”^ and 
by nsing explicit formulas for rank 1 updates he achieved an average cost of for the 

inverse maintenance problem. This improved npon 0{nd‘^~^), the best running time for solving the 
necessary linear system known at that time. 

Nesterov and Nemirovskii [27] built on the ideas of Karmarkar. They showed that how to main¬ 
tain an even cruder approximation to (A^DA)~^ and nse it as a preconditioner for the conjugate 
gradient method to solve the necessary linear systems. By nsing the same rank 1 update formulas 
as Karmarkar and analyzing the quality of these preconditioned systems they were able to improve 
the running time for solving the inverse maintenance problem. 

Vaidya [33] noticed that instead of maintaining an approximation to (A^DA)~^ explicitly one 
can maintain the inverse implicitly and group the updates together into blocks. Using more general 
low-rank update formulas and fast matrix multiplication he achieved an improved cost of 0(nnz(A)-|- 
His analysis was tailored to the case where the matrix A is dense; his running 
time is 0{nd) which is essentially optimal when nnz(A) = nd. Focnsing on the sparse regime, we 
showed that these terms that were “lower order” in his analysis can be improved by a more careful 
application of matrix multiplication [19]. 

Note that, for a broad setting of parameters the previous fastest algorithm for solving the inverse 
maintenance problem was achieved by solving each linear system from scratch nsing fast regression 
algorithms [26, 21, 3] These algorithms all provide an 0(nnz(A) -|- d“’)-time solver for A^D^^^A 
directly by nsing varions forms of dimension rednction. The algorithm in [26] directly projected 
the matrix to a smaller matrix and the algorithms in [21, 3] each provide iterative algorithms for 
sampling rows from the matrix to produce a smaller matrix. 

Also note that we are unaware of a previons algorithm working directly with the a stability 
assnmption. Under this assnmption it is possible that many change by a mnltiplicative constant 
in a single round and therefore updates to A^D^^^A are no longer “low rank” on average, making 
low rank update techniques difficult to apply cheaply. This is not just an artifact of an overly weak 
assnmption; our linear programming algorithm in [19] had this same complication. In [19] rather 
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than reasoning about the a stability assumption we simply showed how to trade-off the £2 stability 
of the linear systems with the convergence of our algorithm to achieve the previous best running 
times. 

1.3 Our Approach 

In this paper we show how to solve the inverse maintenance under both the £2 stability assumption 
and the weaker a stability assumption such that the amortized maintenance cost per iteration 
0(nnz(A) -I- (P) and the solver runs in time 0((nnz(A) -|- d?) log(e“^)). 

To achieve our results, we show how to use low rank updates to maintain a spectral sparsifier for 
A'^dI^) A, that is a weighted subset of 0{d) rows of A that well approximate A^D^^^A. We use the 
well known fact that sampling the rows by leverage scores (see Section 2.3) provides precisely such 
a guarantee and show that we can decrease both the number of updates that need to be performed 
as well as the cost of those updates. 

There are two difficulties that need to be overcome in this approach. The first difficulty is 
achieving any running time that improves upon both the low rank update techniques and the 
dimension reduction techniques; even obtaining a running time of 0(nnz(A) for the current 

u and c > 0 for the inverse maintenance problem under the £2 stability assumption was not known. 
Simply performing rank 1 updates in each iteration is prohibitively expensive as the cost of such an 
update is 0{d?‘) and there would be such updates that need to be performed on average in 

each iteration for some c > 0. Furthermore, while Vaidya [33] overcame this problem by grouping 
the updates together and using fast matrix multiplication, his approach needed to preprocess the 
rows of A by exactly computing A)^^A. This takes time 0{ndP~^) and is prohibitively 

expensive for our purposes if n is much larger than d. 

To overcome this difficulty, we first compute an initial sparsifier of A^D^^^A using only 0{d) 
rows of A in time 0{rmz{A.)+d^) [26, 21, 3]. Having performed this preprocessing, we treat low rank 
updates to these original rows in the sparsifier and new rows separately. For the original rows we 
can perform the preprocessing as in Vaidya [33] and the techniques in [33, 17] to obtain the desired 
time bounds. For low rank updates to the new rows, we show how to use subspace embeddings 
[26] to approximate their contribution to sufficient accuracy without hurting the asymptotic running 
times. In Section 3 we show that this technique alone (even without use of sparsification), suffices to 
mildly improve the running time of solving the inverse maintenance assumption with the £2 stability 
assumption (but not to obtain the fastest running times in this paper). 

The second difficulty is how to use the a stability assumption to bound the number of updates to 
our sparsifier; under this weak assumption where there can be many low rank changes to A^D^^^ A. 
Here we simply show that the a stability assumption implies that the leverage scores do not change 

too quickly in the norm induced by leverage scores (See Section 4). Consequently, if we sample rows 

ik) 

by leverage scores and only re-sample when either d\ ^ or the leverage score changes sufficiently 
then we can show that the expected number of low rank updates to our sparsifier is small. This 
nearly yields the result, except that our algorithm will use its own output to compute the approx¬ 
imate leverage scores and the user of our algorithm might use the solvers to adversarially decide 
the next and this would break our analysis. In Section 5 we show how to make our solvers 
indistinguishable from a true solver plus fixed noise with high probability, alleviating the issue. 

Ultimately this yields amortized cost per iteration of 0(nnz(A) -|- d^) for the inverse mainte¬ 
nance problem. We remark that our algorithm is one of few instances we can think of where an 
algorithm needs to use both subspace embeddings [26] as well as iterative linear system solving and 
sampling techniques [3, 21] for different purposes to achieve a goal; for many applications they are 
interchangeable. 
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In Section 6 we also show that onr approach can be generalized to accommodate for mnltiple 
new rows to be added or removed from the matrix. We also show that onr inverse maintenance 
problem yields an 0((nnz(A) + d'^)y/dlog{e~^)) algorithm for solving a linear program improving 
npon onr previons work in [19] (See Section 7.1). In Section 7 we provide mnltiple applications of 
onr resnlts to more specific problems inclnding minimnm cost flow, mnlticommodity flow, non-linear 
regression, compnting a ronnding of a polytope, and convex optimization and conclnde with an open 
problem regarding the sampling of a random point in a polytope (Section 8). 

2 Preliminaries 

2.1 Notation 

Thronghont this paper, we nse vector notation, e.g x = (xi,... ,Xn), to denote a vector and bold, 
e.g. A, to denote a matrix. We nse nnz(x) or nnz(A) to denote the nnmber of nonzero entries in a 
vector or a matrix respectively. Freqnently, for x G we let X G denote diag(x), the diagonal 
matrix snch that Xjj = x,. For a symmetric matrix, M and vector x we let ||x||m '= \/ af^Mx. For 
vectors x and y we let ||y||^ = 

In most applications, we nse d to denote the smaller dimension of the problem, which can be 
either the nnmber of variables or the nnmber of constraints. For example, the linear programs 
min^n^x we solve in Theorem 21 has n variables and d constraints (and nse this formn- 
lation becanse it is more general than min^-^gc^x). 

2.2 Spectral Approximation 

For symmetric matrices N, M G we write N ^ M to denote that x^Nx < x^Mx for all 

X G M"' and we define N ^ M, N ^ M and N M analogonsly. We call a symmetric matrix M 
positive definite (PD) if M 0. We nse M «£ N to denote the condition that e“'^N ^ M ^ e'^N. 

2.3 Leverage Scores 

Onr algorithms make extensive nse of leverage scores, a common measnre of the importance of rows 
of a matrix. We denote the leverage scores of a matrix A G by a G M” and say the leverage 

score of row i G [n] is ai [A (A^A) ^ A’^jjj. For A G cTg M!(q, and D diag((f) we nse 

the shorthand (TA{d) to denote the leverage scores of the matrix D^/^A. We freqnently nse well 
known facts regarding leverage scores, snch as Uj G [0,1] and < d. (See [32, 24, 21, 3] for a 

more in-depth discnssion of leverage scores, their properties, and their many applications.) 

We nse the following two key resnlts regarding leverage scores. The first resnlt states that one 
can sample 0{d) rows of A according to their leverage score and obtain a spectral approximation 
of A^A [5]. In Lemma 5, we nse a variant of a random sampling lemma stated in [3]. The second. 
Lemma 6, is a generalization of a resnlt in [32] that has been proved in varions settings (see [17] 
for example) that states that given a solver for a A^A one can efficiently compnte approximate 
leverage scores for A. 

Lemma 5 (Leverage Score Sampling). Let A G and let u G M"" be an overestimate of a, 

the leverage scores of A; i.e. Ui > Oi for all i G [n]. Set pi = min |l, log n} for some 

absolute constant > 0 and e G (0,0.5] and let H G be a random diagonal matrix where 

independently Hjj = ^ with probability pi and is 0 otherwise. With high probability in n, we have 
nnz(H) = 0(||n||^e“^ logn) and A^HA A^A. 
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Lemma 6 (Computing Leverage Scores). Let A G let a denote the leverage scores of A, and 

let e > 0. If we have a T-time solver for A^A then in time 0((nnz(A) + T)e“^ log(e“^)) we can 
compute T G ML such that with high probability in n, (1 — e)ai < r* < (1 + e)(Ti for all i € [n]. 

2.4 Matrix Results 

Our algorithms combine the sampling techniques mentioned above with the following two results. 

Lemma 7 (Woodbury Matrix Identity). For any matrices A, U, C, V with compatible size, if A 
and C are invertible, then, (A + UCV)“^ = A’^ - A^^U (C"^ + VA-^U)”^ VA-b 

Theorem 8 (Theorem 9 in [26] ). There is a distribution F over 0{de~‘^) x n matrices such that for 
any A G with high probability in n over the choice ofYl V, we have A^H^IIA A^A. 

Sampling from F and computing HA for any 11 G supp(P) can be done in 0(nnz(A)) time. 

3 Solving the Inverse Maintenance Problem Using Stability 

In this section we provide an efficient algorithm to solve the inverse maintenance problem under 
the £2 stability assumption (See Section 1.1). The £2 stability assumption is stronger than the a 
stability assumption and the result we prove in this section is weaker than the one we prove under 
the a stability assumption in the next section (although still a mild improvement over many previous 
results). This section serves as a “warm-up” to the more complicated results in Section 4. 

The main goal of this section is to prove the following theorem regarding exactly maintaining 
an inverse under a sequence of low-rank updates. We use this result for our strongest results on 
solving the inverse maintenance problem under the a stability assumption in Section 4. 

Theorem 9 (Low Rank Inverse Maintenance). Suppose we are given a matrix A G a vector 

b^^^ G ® number of round r > 0, and in each round k G [r] we receive 3-^^ G such that 

g(fc) A'^D'^^^A is PD. Further, suppose that the number of pairs {i,k) such that d\^'^ 7 ^ d\^ 
is bounded by a < d and suppose that there is fd > 2 such that 1^1 ^ ^ Then, in 

round k, we can implicit construct a symmetric matrix K such that K (B^^)) such that we 

can apply K to an arbitrary vector in time 0(d^ log(/3)). Furthermore, the algorithm in total takes 
time 0{sdF~^ + ard (y)*^ + ra‘^) where s *= max{nnz(do), d}. 

This improves upon the previous best expected running times in [33, 17] which had an additive 
0{ndF~^) term which would be prohibitively expensive for our purposes. 

To motivate this theorem and our approach, in Section 3.1, we prove that Theorem 9 suffices 
to yield improved algorithms for the inverse maintenance problem under £2 stability. Then in 
Section 3.2 we prove Theorem 9 using a combination of classic machinery involving low rank update 
formulas and new machinery involving subspace embeddings [26]. 

3.1 Inverse Maintenance under I 2 Stability 

Here we show how Theorem 9 can be used to solve the inverse maintenance problem under the £2 
stability assumption. Note this algorithm is primarily intended to illustrate Theorem 9 and is a 
warm-up to the stronger result in Section 4. 

Theorem 10. Suppose that the inverse maintenance problem satisfies the £‘^ stability assump¬ 
tion. Then Algorithm 2 maintains a 0(nnz(A) -|- d?)-time solver with high probability in total time 

0{sd^~^ + r (^nd‘^~^) ) where s = max{maxfcg[^] nia.7,{d^^'), d} and r is the number of rounds. 
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Algorithm 2: Algorithm for the Stability Assumption 
Input: Initial point £ I^>o- 
Set := 

q(o) A^D(“P’’)A. 

Let be an approximate inverse of computed using Theorem 9. 

Output: A 0{(f + nnz(A))-time linear solver for A'^D^^^A (using Lemma 31 on 
for each round k £ [r] do 

Input: Current point £ M>o- 
for each coordinate i £ [n] do 

if < l.ldf^^^ is false then 

end 

Q(fc) a^d(“^''’)a. 

Let be an approximate inverse of computed using Theorem 9. 

Output: A 0(d^ + nnz(A))-time linear solver for A^D^^^^A (using Lemma 31 on 

end 


Proof. Recall that by the stability assumption || log(t?^^)—log(d^^“^^) II 2 <0.1 for all k. Therefore, 
in the r rounds of the inverse maintenance problem at most O(r^) coordinates of change by 
any hxed multiplicative constant. Consequently, the condition, < df^^ < in 

Algorithm 2 is false at most 0(r‘^) times during the course of the algorithm and at most O(r^) 
coordinates of the vector change during the course of the algorithm. 

Therefore, we can use Theorem 9 on A^D^“^’’^A with a = 0{r^) and s as dehned in the theorem 
statement. Consequently, the total cost of maintaining an implicit approximation of A) 

for r rounds is O [sd‘^~^ + dr‘^~^^ + 

To further improve the running time we restart the maintenance procedure every ^^^+1 

iterations. This yields a total cost of maintenance that is bounded by 


o{\ -^ 

which is the same as 

o{\ - 

Since a; < 1 + \/2, we have d{dP) 2^+1 < d‘^ and thus sd‘^~^ dominates d{sd^~^) 2^+1 when s = d. 
Furthermore, as s > d the term sd““^ grows faster than d{sd^~^) ^^^+ 1 . Consequently, sd^~^ always 
dominates and we have the desired result. □ 




3.2 Low Rank Inverse Maintenance 

Here we prove Theorem 9 and provide an efficient algorithm for maintaining the inverse of a matrix 
under a bounded number of low rank modihcations. The algorithm we present is heavily motivated 
by the work in [33] and the slight simplihcations in [17]. However, our algorithm improves upon 













the previous best cost of 0{nd^~^ + achieved in [17] by a novel use of subspace 

embeddings [26] that we hope will be of independent interest.^ 

We break our proof of Theorem 9 into several parts. First, we provide a simple technical lemma 
about maintaining the weighted product of two matrices using fast matrix multiplication. 

Lemma 11. Let A,B G and suppose that in each round k of r we receive G M"'. 

Suppose that the number of pairs {i,k) such that ^ xf^ or ^ is upper bounded by 

a. Suppose that nnz(x^^^) < a, nnz(^^^) < a and a < d. With 0{da‘^~^) time precomputation, we 
can compute explicitly in an average cost of 0{ad (y)^ ) per iteration. 

Proof. For the initial round, we compute X^^^AB^Y^*^) explicitly by multiplying an a x d and a 
d X a matrix. Since a < d, we can compute X^^^AB^Y^^^ by multiplying O matrices of size 
a X a. Using fast matrix multiplication this takes time 0{da‘^~^). 

For all k G [r] let = X(^) - X(^-i) and = Y(^) - Consequently, 

XWAB^Y^^') = AB^ ^ ^ 


Note that nnz(x^^^) and nnz are less than 2a. Thus, if we let Uk denote the number of 
coordinates i such that ^ x[^ or ^ yf^ then to compute X^*^^ AB^Y^*^) we need only 
multiply a UkXd with a dxu^ matrix and multiply two 2a xd matrices with dxu^ matrices. Since 
Uk < Oi, the running time is dominated by the time to compute the 2a x d and a d x Uk ■ Since 
Uk < a < d, we can do this by multiplying O matrices of size Uk x Uk- Using fast matrix 

multiplication, this takes time 0{adu‘f~‘^). 

Summing over all Uk we see that the total cost of computing the X^^^AB^Y^^^ is 


O 



where in the second inequality we used the concavity of x‘^ Since this is at least the 0{da‘^ 
cost of computing X^^^AB^Y^^^ we obtain the desired result. □ 


Next, for completeness we prove a slightly more explicit variant of a Lemma in [17]. 

Lemma 12. Let A G ..., G K^g, and B^^) A^D^^^A for all k. Suppose that the 

number of pairs {i, k) such that d[^'^ ^ dj^ is bounded by a and a < d. In 0{nd‘^~^ +adr (y)^ 
total time, we can explicitly output C G and a Y^^') G in each round k such that 


where is a nxn diagonal matrix with a|^^ = — d^l'^. Furthermore, is an 0{a) x 0{a) 

matrix if we discard the zero rows and columns. 

Proof Let B = B(°) for notational simplicity. Note that we can compute B directly in 0{ndP 
time using fast matrix multiplication. Then, using fast matrix multiplication we can then compute 
B~^ explicitly in 0{d^) time. Furthermore, we can compute C = AB~^ in 0{nd‘^~^) time. 

^We require a slightly stronger assumption than [17] to achieve our result; [17] did not require any bound on /3. 
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Let Uk be the number of non-zero entries in and be a x n matrix such that 

the = 1 if j is the index of the non-zero diagonal entries in and 0 otherwise. Let 

S(fe) = (p(fc)) (P(^))^. By definition of and p(^), we have 

A(fc) = ^ 

By the Woodbury matrix identity (Lemma 7), we know that 

_ g-l _ qT^{ k) ^p(fc)^^'p(fc)p{fc)^(A:)^^ 

where + p(*^)A(^)CA^A(*^)(p(^))'^)“^. Now by Lemma 11, we know that we 

can maintain A^^^CA'^A^^^ in an additional 0{adr (y)^ time, using that C = AB^^ was 
precomputed in 0{nd‘^~^) time. Having the matrix A^^^CA^A^^^ explicitly, one can compute 
p(^)A^^^CA'^A^^) (pl*^))"^ in 0{u\) time and hence compute in 0{u‘^) time using fast matrix 
multiplication. Hence, the total running time is 0{nd‘^~^ + adr (y)*^ ^ convexity of 

x‘^ yields < a‘^. Hence, the total time is bounded by 

0{nd^ ^ + adr + a‘^) = 0{nd'^ ^ + adr ). 

By setting V(^) = (pl^))^T(^)p(^), we have the desired formula. Note that is essentially 
with the zero columns and rows and hence we have the desired running time. □ 

We now everything we need to prove Theorem 9. 

Proof of Theorem 9. Let S C [n] denote the indices for which 3-^^ is nonzero. Furthermore, let 
us split each vector 3^^ into a vector for the coordinates in S and for the coordinates 
not in 5, i.e. f^^'> G M!(q such that 3^'> = 3^^ -I- f^^\ By the stability guarantee, we know 
/^-1 b(o) ^ p(fc) ^ ^b(°) for some /3. We make A'^E'^^^A invertible for all k by adding j^3^^ to 
all of c^^^and we will show this only increases the error slightly. 

Now, we can compute B and (B^^)) in 0{sd‘^ time using fast matrix multiplication 
where s = |5|. Furthermore, using Lemma 12 we can compute C and maintain such that 

in total time 0{sd‘^~^ + adr (y)^ ^). All that remains is to maintain the contribution from 

Using our representation of (A^E^-^^A) ^ and the Woodbury matrix identity (Lemma 7), we 
have 

= (^a'^e(^)a +a^f(^)a)”^ 

= (^A^E(^) A -F pik) 

= A^F(^) ( m ^ A (^A'^e(^) a) (3.1) 
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where 


mW = ^A^F(^) (3.2) 

Now note that 

F^A (^A^e(*^)a)~^ = F^'^^A (^A^e(^)a)~^ (^A^E^^'^a) (^A^E^^^a)"^ A^F^^) 

where 

N(fc) A (^A'^e(*^)a)~^ 

= ^(^D(°)y'^^+^(^F^^^y^^ - (^D(°)y^y ^ a a^f^*^^ 

Note that computing the A (B^®^) ^ A^F^^^ term directly would be prohibitively ex¬ 
pensive. Instead, we compute a spectral approximation to and show that suffices. 

Since is a rank a matrix, we use Theorem 8 to 11 G such that 

-Ni^) (3.3) 

for all k with high probability. Now, we instead consider the cost of maintaining 

To see the cost of maintaining we separate the matrix as follows: 

= n A (3.4) 

+n ^(^E(^)y''^ - (^D(°)y^y a ( b ^ y ~^ 

-n ac'^A(^) A^^^^CA^F^'") 

For the hrst term in (3.4), note that (D^®))^^^ A is a s x d matrix and hence we can precompute 
(D(o))^/^ A (B(o))“^ in 0{sd^ ^) time and therefore precompute 11 A (B^^)) ^ in the 

same amount of time. Note that 11 is a 0(a} x d matrix, so, we can write 

n (D(°)y^^ A (^b(°))~^ a^f(*=) = aka^f(*=) 

where K is some explicitly computed n x d matrix and A is a diagonal matrix with only 0{a) 
non-zeros. Consequently, we can use Lemma 11 to maintain n(D(*^))^/^A(B(‘^))“^A^F^^) in total 
time 0{sd‘^~^ + adr (^)^ ^). 
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For the second term in (3.4), we can precompnte A (B^*^)) ^ in 0{sd^ time. Therefore, nsing 
Lemma 11 we can maintain A (B^^)) ^ in total time O (“)‘^ 

and by Theorem 8 we can maintain 11 A ^ in the same 

amonnt of time. 

For the last two terms in (3.4), we nse Lemma 11 on 11 (D*^°)) AC^A^^^ and 

^(E(^))^^^ — AC’^A*^^) all in total time O i^adr {f)‘^ Note that all of those matrices 

inclnding are essentially 0(a) x 0(a) matrices if we discard the zero rows and colnmns. So, 
having those matrices explicitly compnted, we can compnte the last two terms in an additional of 
0(a‘^) time per iteration. 

Finally compnting (N^^^)^ only reqnires an additional 0(a‘^) time per-iteration. 

Hence, the total cost of maintaining (N^^^)'^ is 

O ^adr ^ +sd‘^~^+ra‘^ 



Using (3.1) and (3.2), we have shown how to approximate (B^*^)) Now, we show how to 
compnte a better approximation. Recall from (3.1) that 

Using Lemma 12 as we have argned, we can apply the hrst term (A^E^^^A) ^ exactly to a vector 
in O(d^) time. The only difhcnlty in applying the second term to a vector comes from the term 
(M(^))"\ Using (3.3), we have 

m(^) 

and hence 

Since compnting ^(F^^))^ + (N^*^))"^only reqnires (^(a^^) time and (3.2) shows that 
we can apply to a vector exactly in 0(d^) time, by Lemma 31 we have a symmetric matrix 

snch that we can apply to a vector in 0(d^ log(e“^)) for any e > 0. Hence, 

we obtain 

A^ (^a^e(*^)a)~^ a^f(*^) (^m(*^))~^f(^)a (^a^e(*=)a)~^ 

snch that we can apply A^^^ to a vector in 0(d^ log(e“^)) time. All that remains is to compnte 
what valne of e is needed for this to be a spectral approximation to the (B^*^^) 

Recall that by the assnmption A B^^^ A /3B^*^). As we mentioned in the beginning, we 

replaced with and compnte a constant spectral approximation to the new b(u 

that will snfhce for the theorem statement. Conseqnently 

A(fc) drf YTF^k)Y~^ - a 10/3 
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Furthermore, since > 0 we have 0 ^ and therefore 

- 10e,5 < A(^) ^ A(^) + lOe/? . 

Using the assumption ^ ^ again, we have 

A(*^) - lOe/?^ (b^Y~^ ^ ^ . 

Picking e = 0{^^), we have a matrix (A^E^^^A) ^ — A^^^ ~o(i) ( B^^)) ^ which we can apply in 

0(d^log(/3)) time. Furthermore, again since ^ ^ /3B(^) we see that our replacement 

of d with + j^d^*^)only affected our approximation quality by a constant. □ 


4 An algorithm for the a Stable Case 

In this section, we provide our algorithm for solving the inverse maintenance problem under the a 
stability assumption. The central result of this section is the following: 

Theorem 13. Suppose that the inverse maintenanee problem satisfies the a stability assumption. 
Then Algorithm 3 maintains a 0{nnz{A)+cfi)-time solver with high probability in total time 0{d^ + 
r(nnz(A) + d^)) where r is the number of rounds. 

To prove this we first provide a technical lemma showing that leverage scores are stable in 
leverage score number assuming a stability (See Section 4.1). Using this lemma we then prove the 
Theorem 13 (See Section 4.2). This proof will assume that the randomness we use to maintain 
our solvers is independent from the output of our solvers. In Section 5 we show how to make this 
assumption hold. 


4.1 Leverage Scores are Stable under a Stability 


Here we show that leverage scores are stable in the leverage score norm assuming a stability. This 
technical lemma. Lemma 14, is crucial to showing that we do not need to perform too many low-rank 
updates on our sparsifier in our solution to the inverse maintenance problem under the a stability 
assumption. (See Section 1.3 for more intuition.) 

Lemma 14. For all A G and any veetors x,y GM^O sueh that ||ln(x) — ln(y)||^ < e, we 

have 

II IndA(T) - ln<TA(y)||-^(-) < e^ll Inf - lny|| . 


Proof. For t G [0,1], let InOt denote a straight line from Inf to Iny with 6q = x and 9i = y oi 
equivalently let 6t exp(lnx -|- f(lny — Inx))). Since || ln(f) — ln(y)||^ < e we have for all i G [n] 

<?A(f)i = ijVXA (A^XA)~^ A^VXU 

if \/0A (A^0A)"^ A^Veii 

= ^A{0i) (4.1) 


Consequently, for all f, we have Jensen’s inequality we have 


In 


C7A(f)-lnfA(d)L^(^) 


< 


^ln<TA(dt) ) dt 


< e" 


(TaU) 


^ln3A(e,; 


O'A(t't) 


dt (4.2) 
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Now, for all z G M”q let Jin 2 -(ln(TA( 2 )) denote the Jacobian matrix 
and suppose that for all z and u we have 


91nCTA(^i 


aiiiCTAl^i „ 

d In Zj 

ij 

dzj ^3 





(4.3) 


Then 

f 


^in5A(e,; 




dt= I \\3^^ff^{lnaA{9t)) [-^ln6t 


U ^ lux — Inw L 


(4.4) 


where again used 4.1 as well as the definition of Ot- All that remains is to prove (4.3). For this, 
we first note that in [17, Ver 1, Lemma 36] we showed that = (Sa('?) ~ M) where 

= (^\/ZA(A^ZA)-iA'^\/zy . Consequently Jiii 2 -(ln(jA('?)) = 5 ]a(-?) ^ (Sa(T) — M(z)). 
Now note that 




E (VZA(A^ZA)-^A^\/z) 


2 


\/ZA(A^ZA)-^ A"^\/ZIj, \/ZA(A'^ZA)-1A^\/Zlj ^ 


(^\/ZA(A^ZA)-iA^\/z) 

(d=A( 2 ))j • 


33 


Consequently, Sa(T) — M is a symmetric diagonally dominant matrix and therefore L;a(^) ^ 
Sa(-?) — M ^ 0 and 0 ^ SA(T)^/^Jin^(lncjA('?))5iA(-?)~^^^ ^ I- Using this fact, we have that for 
all z G M” Q and u G M"" 


|Jln2(lncrA(2:))n||^^^^^ - 


< m 




and this proves (4.3). Combining (4.2) and (4.4) yields the result. 


□ 


4.2 Algorithm for a Stability 

Here we prove Theorem 13. The full pseudocode for our algorithm, with the exception of how we 
compute leverage scores and maintain the inverse of AH^^^^A can be seen in Algorithm 3. 

First, we bound the number of coordinates H that will change during the algorithm. 

Lemma 15. Suppose that the ehanges of d and the error oeeurred in eomputing a is independent 
of our sampled matrix. Under the a stability guarantee, during first r iterations of Algorithm 3, the 
expected number of coordinates changes in over all k is 0 (r^ logd). 

Proof. Since our error in computing a is smaller than the re-sampling threshold on how much change 
of (T, the re-sampling process for the row happens only when either crA(d)i or di changes by more 
than a multiplicative constant. In order for re-sampling to affect A^HA, it must be the case that 
it is either currently in A^HA or about to be put in A^HA. However, since whenever re-sampling 
occurs the resampling probability has changed by at most a multiplicative constant, we have that 
both these events happen with probability 0(7 • cTA(d)i) using the independence between d and the 
approximate a. By union bound we have that the probability of sampling row i changing the matrix 
A'^HA is C»(cJA(d)ilog(d)). 
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Algorithm 3: Algorithm for a Stability Assumption 
Input: Initial point G I^> 0 ' 

Set ;= (|to) and 7 “= lOOOcs logd where Cg dehned in Lemma 5. 

Use Lemma 6 to hnd such that 

For each i G [re] : let '■= dij min{l ,7 • with probability min{l ,7 • 

and is set to 0 otherwise. 

q{ 0 ) a^h(°)A. 

Let be an approximate inverse of computed using Theorem 9. 

Output: A 0{d? + nnz(A))-time linear solver for A (using Theorem 10 on 

for each round k G [r] do 

Input: Current point G K>o- 

Use Lemma 6 and the solver from the previous round to hnd such that 

0.994“^") < 

for each coordinate i G [re] do 

if either 0.9cjj°^'^^ < or 0.9d\°^^^ < is violated then 

dfd) _ 

(old) (apr) 

-=^1 • 

:= dj-^V niin{l ,7 • with probability min{l ,7 • 

and is set to 0 otherwise. 

else 

I := 

end 

end 

Q(fc) d^f A^H(^)A. 

Let be an approximate inverse of computed using Theorem 9. 

Output: A linear 0{d? + nnz(A))-time solver for A (using Theorem 10 on 

end 


Observe that whenever we re-sampled the row, either (TA.{d)i or dj has changed by more 
than a multiplicative constant. Let ki be the last iteration we re-sampled the row and let 
k 2 be the current iteration. Then, we know that lncrA(d^^'’'^^)« — ln(TA((?^^)i = f^(l) or 


E k2-1 
k= 


ki 


lndT+^^ -lnd1*^^ 


> 0(1). Since there are only r steps, we have |/c 2 — fei| < r and hence 


^ (^lncrA(c?'^’^^^)j - IncJA(d^^^)j) = O or ^ (^lnd|^^^^ - Ind^^^^ =0 

k=ki k=ki 

Since crA{d)i does not change more by a constant between re-sample (by cr-stability assumption), 
we have 

k2—i 2 

^ (TA{d^^^)i (lnaA{(^’"^^'’)i - lno-A(d^''^)i) = 0 

k=ki 

Y^aA(<if%{lndf^'>-lndfy = f! 

k=ki 



VA(d> 2 )), 


or 
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In summary, re-sampling the row indicates that the sum of the a norm square of the changes 
of either Iucta or Ind is at least crA{d)i/r and with 0{aA{d)i logd) probability, the sampled matrix 
is changed. Since Ylk il < 0{r) and by Lemma 14 Ylk || — 

In (Ta(c?^^)||^ < 0(r) we have the desired result. □ 

We now everything we need to prove our main theorem assuming that the changes of d and the 
error occurred in computing a is independent of our sampled matrix. (In Section 5 we show how to 
drop this assumption.) 

Proof of Theorem 13. Note that by design in each iteration k G [r] we have that ~o .2 

X;{oW) 2 where 5] diag(cr). Thus, we see that the sample probability was chosen 

precisely so that we can apply Lemma 5. Hence, we have ~o.i A^D^^^A. Thus, the 

algorithm is correct, it simply remains to bound the running time. 

To maintain inverse of A^H^^'^A, we can simply apply Theorem 9. By Lemma 5, we know that 
with high probability nnz(H(^)) < 0{d). By Lemma 15 we know there are only O(r^) coordinate 
changes during the algorithm in expectation. Therefore, we can use Theorem 9 on A^H^^^A with 
a = O(r^) and s = 0{d). Hence, the average cost of maintain a linear 0(nnz(A) -|- d^)-time solver 
of (A'^H^^^A) is O (^ -|- dr‘^ + . Similar to Theorem 10 , we restart the algorithm every 

r = 2 ‘-'+i and see that the total cost of maintenance is 0{d‘^ + rd'^‘^+^). Since io < 1 \/2, 

we have the total maintenance cost is O (dP + rd^^ . 

Thus, all the remains is to bound the cost of computing However, since by the ^stability 
assumption || log(c4+i) — log(c4)||g^ < 0.1 we know that sso.i and thus Rio .2 

ATD(fc+i)A. Thus, using Lemma 6 we can compute for all /c > 2 in 0(nnz(A) -|- d? log/3) time 
within 0.01 multiplicative factor. Furthermore, using this same Lemma and fast matrix multiplica¬ 
tion, we can compute in 0(nnz(A) -|- d‘^) time; therefore, we have our result. 

Note that Lemma 15 assume that the changes of d and the error occurred in computing a is 
independent of our sampled matrix. Given any linear solver. Theorem 17 in Section 5 shows how to 
construct a solver that has same running time up to 0(1) factor and is statistically indistinguishable 
from the true solver (A^D^^^ A) and thus circumvent this issue; thereby completing the proof. □ 

5 Hiding Randomness in Linear System Solvers 

In many applications (see Section 7), the input to a particular round of the inverse maintenance 
problem depends on our output in the previous round. However, our solution to the inverse main¬ 
tenance problem is randomized and if the input to the inverse maintenance problem was chosen 
adversarially based on this randomness, this could break the analysis of our algorithm. Moreover, 
even within our solution to the inverse maintenance problem we needed to solve linear systems and 
if the output of these linear systems was adversarially correlated with our randomized computation 
our analysis would break (see Section 4) 

In this section, we show how to fix both these problems and hide the randomness we use to ap¬ 
proximately solve linear system. We provide a general transformation, NoisySolver in Algorithm 4, 
which turns a linear solver for A^A into a nonlinear solver for A^A that with high probability 
is indistinguishable from an exact solver for A^A plus a suitable Gaussian noise. The algorithm 
simply solves the desired linear system using the input solver and then add a suitable Gaussian 
noise. 
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Algorithm 4: NoisySolver( 6 , e) 

Input: a linear 7~-time solver S of A^A, vector b G M”, and accuracy e G (0,1/2). 
yi := S(5, ei) where ei = e (32n^) 

Let fj G M” be sampled randomly from the normal distribution with mean 0 and covariance I. 
y 2 := S(A^ 7 /, 62 ) where 62 = (l 2 dn®) 

Output: y = 2/1 + ay 2 where a = lll/illArA • 


We break the proof that this works into two parts. First, in Lemma 16 we show that NoisySolver, 
is in fact a solver and then in Theorem 17 we show that with high probability it is indistinguishable 
from an exact solver plus a Gaussian noise. 

Lemma 16. Let A G and let S be a linear T-time solver for A'^A. For all b G and 

e G (0,1/2), the algorithm NoisySolver( 6 , e) is a (nnz(A) +T)-time solver for A"^A. 

Proof By the definition of y and the inequality (a + 6 )^ < 2a^ + 26^, we have 

2 


y-{A^A)-^b 


A^A 


< 2 yi- A^A 


T 


\aTA 


To bound the first term, recall that by the definition of S 

||yi - (A^A) < ei|| (A^ A) 


^11 


+ 2 a h /2 


A^A- 


A^A- 


To bound the second term, we note that by the definition of ff, ||??||2 follows x^-distribution with n 
degrees of freedom. It is known that [16, Lem 1] for all t > 0, 

P (||f /||2 > n + 2Vni + 2t^ < exp(—t) . 

Hence, with high probability in n, ||f /||2 < 2n. Using this and the definition of S yields 


ImW^ATA - 2||y2-(A^A) " A^7 /||^j,^ + 2|| (A^A) ^ A.'^vWata 
< 2(l + e2)||(A'^A)“'A^r/||^,^ 


< 4||f/|| < 8n. 


where we used that €2 < 1 and || A (A'' A 
proof yields that 




< 1. Using that ei < 1 and applying a similar 


2 _ ^ l |-||2 / e 

“ “ 32n 


fA^A' 


-1 


A^A 


Consequently, with high probability in n. 


y-(A^A) H 




AT A 


< 2ei|| (A^A) 


+ 16a^n < e|| (A^A) ^ 6 | 


I AT A 


|2 

I AT A 


□ 


Theorem 17. Let IdealSolver( 6 ) = (A'^A) ^6 + c where c follows the normal distribution with 


ata 


. Then, the total varia- 


mean 0 and covariance (A'^A) ^ where j3 = (A^A) ^b 

tion distance between the outcome 0 /NoisySolver and IdealSolver is less than 1/n^. Therefore, 
any algorithm calls IdealSolver less than times cannot distinguish between NoisySolver and 
IdealSolver. 


17 





















Proof. Since S is linear, we have y 2 = for some matrix Since fj follows the normal 

distribution with mean 0 and covariance matrix I, we have 


E [Ml] = [fjf] AQl = Q,A^AQ 

Lemma 32 shows that (A^A) ^ and hence 


E [Ml] 


■^4^/e2 


(A^ A 


-1 


(5.1) 


Since y 2 is a linear transformation of ff, y 2 follows is given by the normal distribution with mean 0 
and covariance QejA^AQ^, i.e. y 2 G AA(0, Q^jA^AQ^). 

Now let, y, z be the output of NoisySolver(6) and IdealSolver(6) respectively; we know that 

y G AA (yi,a^Qe 2 A^AQ^J 
z G AA((A^A)"^5,/32 (A^A)“^) 


AT A 


. To bound y —z , Pinsker’s inequal- 


where a = lyfllyillATA and/3 = lyf (A^A) b 

ity shows that ||y ~ ^DxL{y\\z) and using an explicit formula for the KL divergence for 

the normal distribution we in turn have that ||y — -?||^^ is bounded by 


1 


tri 


A'^AQ.^ A'^AQ^^ ) + ||yi - (A'^A)"' ^|| " ^ + 


In 


det/32(A^A)~^ 
det a 2 Qe 2 A^AQ^ 


Hence, we need to prove A^AQ^ ps (A^A) ^ and yi ps (A^A) ^ i 

First, we first prove yi ~ (A'^A) ^ b: 


|yi-(A^A) '&||J-2 (ata) ^b 

64nei 


.T x\-^ 


AT A 


< 


e 


(Definition of Sand yi) 
(Definition of /?) 


Next, to prove Q!^Qe 2 A^AQ^ ss /3^ (A^A) , we note that by triangle inequality and the definition 

of S and yi we have 


Ja)-' 

Therefore by the definition of a and /3 we have 


(1-\/d) (A^A) "6 < IlyillATA < (1 +Vd) (A^A) ^b 


AT A 


(l-3ViT)<^<(l + 3Vir) 


Using (5.1) then yields that 

a^d^^A^AQtl /3^ (A^A)-^. 

Therefore, we have 

tr((^)VAq,A^Aqs) = tr((?)qA^A)‘/“q,A^Aq;{A^A)''^) 

< d + 8y/€2d + Sy/eid 
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and 


Indet ((2)^ (A^A)2= Q^A^AQJ (A^A)''=) 

din 

A^/e2d + A^Jeid. 

Combining inequalities above and using our choice of ei and 62 we have 
||2/-z||tv ^ ^Wl2v^d + 12^d + 64 ny < 1/n^. 


/ det/32(A^A) ^ \ 
ydet a^Q^^A^AQ^ j 


□ 


6 Row Insertion and Removal 

For some applications we need to solve the inverse maintenance problem when the matrix A might 
change between rounds. In particular some rows of A might be entirely removed or some new rows 
might be added. 

For example, in each iteration of cutting plane methods, a new constraint is added to the current 
polytope {Aai > 6}; each iteration of quasi newton methods, the current approximate Hessian is 
updated by a rank 1 matrix. If the matrix is updated only by a rank 1 matrix each iteration, the 
inverse can be updated efficiently using the Sherman-Morrison formula. However, for the general 
case, it is less obvious when the matrix A and the diagonal D can be changed by a high rank matrix. 

Here we formally define the more general set of assumptions under which we would like to solve 
the inverse maintenance problem and show how to solve this problem efficiently. 

Definition 18 {K Stability Assumption). The inverse maintenance problem satisfies the K stability 
assumption if 

1. A row of A is revealed to the algorithm at round k if and only if the corresponding entry in 
S-k) 

is non-zero. 

2. For each /c E [r], we have either 

(a) II log(#)) - log(#“^))||^^(^-(fc)j < 0.1 and || log(#)) - log(#“^))||^ < 0.1; or 

(b) The vectors 3-^'^ and differ in at most K coordinates. 

3. /3-iA^D(0)A ^ A^DWA ^ /JA^D^A for (5 = poly(n). 

Some of the previous algorithms for the inverse maintenance, such as [33], rely on the assumption 
that the entire matrix A is given explicitly. In particular these algorithms perform precomputation 
on the entirety of A that they use to decrease the amortize cost of later steps. Here we show that 
the Algorithm 3 we proposed does not have this drawback and can be easily modified to solve this 
version of inverse maintenance problem under the K stability assumption for a fairly large K. 

Theorem 19. Suppose that the inverse maintenance problem satisfies the K stability assumption 
for K < where uj is the matrix multiplication constant. Then there is a variant 

of Algorithm 3 that maintains a 0(nnz(A) + d?‘)-time solver with high probability in total time 
0{d^ + r(nnz(A) + df)) where r is the number of rounds. 
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Proof. From our solution to the problem under the £2 stability assumption, i.e. the proof of Theo¬ 
rem 9, we see that we do not need to know the entire matrix A to maintain an approximate inverse 
provided that the number of low rank updates we need to perform is small. Therefore, it suffices 
to show how to reduce the a stability assumption with row addition and removal to the low rank 
update problem mentioned in Theorem 9. 

Our previous reduction under the a stability assumption, i.e. Theorem 13, relied on two facts, 
( 1 ) that ~o(i) A'^D'^^^A for all k and that ( 2 ) the expected number of row changes is 

0{r‘^). The hrst condition, (1) was used to ensure that we could efficiently compute approximate 
leverage scores for A^D^^^A and the second condition ( 2 ) was used to ensure our invocation of 
Theorem 9 was efficient. In the remainder of this proof we show how both condition (1) and (2) 
can be achieved. 

First we show how to achieve condition (1) under changes to First note that if the change 
follows case (a) of the K stability of assumption then since || log(d^*^^) — log(t?^“^^)||^ < 0.1 clearly 
A 2 ^D(^-i)A Rio(i) A^D('=)A as before. On the other hand, for case (b) we claim that (1) can be 
achieved by adding intermediate steps into the problem as follows. First, we can assume 
or by splitting changes that corresponds to inserting or removing rows into two steps. 

If we do the insertion hrst, it is easy to prove that y^^A^D^^^A ^ A'^D(^)A ^ 7 A^D(o)A still 
holds for some 7 = 0(/3) after the splitting. For the insertion case, we consider the matrix 

H(a) = A"^ + a A. 

Since ^ 0, it is easy to see that H(a) ~o(i) H(2a). Since H(l) ~poiy(n) H(0), as 

we have just argued, we have A^ A ~poiy(n) H(0) and hence H(poly(l/n)) ~o(i) 

H(0). Therefore, one can split the insertion case into 0(log(n)) steps to ensure A^D(^“^^A ~o(i) 
A^dI^IA. The argument for removal is analogous. 

Next we show how to achieve condition (2). Again we consider the case where a round is updated 
by (a) and by (b) separately. For (a), the changes of d is small in a norm, we already know from 
our previous analysis of the cr stability assumption that it can only cause O(r^) many rows change. 
For case (b) we dehne a be the total number of row changes due to (b). From the dehnition of 
Algorithm 3, we see that there are two causes of resampling, namely, the change of d and the change 
of cf. By the assumption (b) we know that there are at most K rows changes in d and therefore, 
this contributes at most rK to a. For the change of a, from hrst half of the proof of Lemma 15, we 
know that the number of resampling that causes changes in h is bounded by 


max min | \n) —In ,l|. 

i 

Therefore, we have that 

a < r A -|- y ^ max cTi a,(dW))| 

min IIIn(Tj (t?^))-lnai(c?''+i))|,l}. 


( 6 . 1 ) 


i,k 


For any k, we dehne 


such that ^ ■ 


20 






Then, we have that 


, (7j 


a < rK + 0{l) EE| EE max I ai 

k i£Sk k i^Sk 

< rK + 0{l) EEl EEl 


k iGSk 

i^k 


k i^Sk 


Again, we can split the second case into insertion only and removal only. For the insertion case, we 
note that ai{dJ'^~^^'>) < for all i with Since X = X we 

have that 




< 4 




i such that 


*ip] 


We have the same bound for the removal case. Putting it into (6.1), we have 
< rK + O (1) ^ j{i such that — 0{rK). 


a 


This proves that the total expected number of row changes is bounded by 0{r^ + rK). Note that 
we use the fact that if there is an update triggered by case (a) after the update is triggered by case 
(b) then we account for it by case (b), i.e. the interactions between case (a) and (b) only increase 
the number of row changes by a constant. 

Now, we can apply Theorem 9 and restart every r = (K~'^ steps and get that the average cost 
of maintenance is 

O + (r^ + rK)d 

= d(d^ + ^ ^ 

= d(^d^ + + d^-‘^K^-^ + d2^(‘^-2) + d‘^^‘^-^'> 

Now using that a; < 1 + \/2 we have that 1 + (w — 2)io < 2 and 2u{u — 2) < 2 and therefore the 
average cost of maintenance is 


O (d^ + ^ 


Hence, as long as K < d^^ we have that the average cost is O(d^). 


□ 


Remark 20. Using u < 2.37287 [7], the above theorem shows how to solve inverse maintenance 
problem with d^-‘^568 j.Q.^yg addition and removal in amortized O(d^) time. 


21 













7 Applications 


In this section, we provide mnltiple applications of onr algorithm for solving the inverse maintenance 
problem nnder the I 2 and a stability assnmptions. First in Section 7.1 we show how to nse onr resnlts 
to solve a linear program. Then in Section 7.2, Section 7.4, and Section 7.5 we state the conseqnences 
of this linear programming algorithm for solving the non-linear regression, mnlticommodity flow, 
and minimnm cost flow respectively. In Section 7.3 we show how onr results lead to new efficient 
algorithms for computing a rounding of a polytope. 


7.1 Linear Programming 


Here we show how to use our solution to the inverse maintenance problem under the a stability 
assumption (See Section 4) to improve the running time of solving a linear program. Our main 
result is the following: 

Theorem 21. Let A G c,l,u ^ and 6 G for d < n and suppose x G M"" is an interior 

point of the polytope 

S = < X G M"' : Ax = h and U < Xi < Ui for all i G [n] > 


Let U = max 


U — l 




1l II 

HI ^ 


1 n — 


c] 

00 


11 00 ’ II 

iiooy 


Then, consider the linear program 


OPT= minc^x 

xS5 


(7.1) 


In time O yVd (nnz(A) -|- log{U/e)j, we can compute y such that ||Ay — < 

Xj < Ui and cFy < OPT + e where S is a diagonal matrix Su = min {ui — yi,yi — li). 

Proof. In [18, ArXiv v2, Thm 28], we showed how to solve linear program of the form (7.1) by 
solving a sequence of slowly changing linear systems A^D^^^Ax = where is a diagonal 
matrix corresponding to a weighted distance of Xk to the boundary of the polytope. In [18, ArXiv 
v2, Lem 32] we showed that this sequence of linear systems satisfied the a stability assumption. 
Furthermore, we showed that it suffices to solve these linear systems to l/poly(n) accuracy. Since, 
the algorithm consists of y/dlog{U/e) rounds of this algorithm plus additional 0(nnz(A) time 

per round we have the desired result. □ 


We remark that we can only output an almost feasible point but it is difficult to avoid because 
hnding any point x such that Ax = b takes 0{ndP~^) which is slower than our algorithm when 
d. Similarly, we have an algorithm for the dual of (7.1) as follows: 

Theorem 22. Let A G where d < n. Suppose we have an initial point x G M"" such that 

AJ'x = h and —1 < x* < 1. Then, we can find y such that 


b y + ||Ay-Fc]|^ < 


(7.2) 


in O ( yd (nnz(A) -|- d^) log {U/e)] time where U = max 


1—X 



Proof. It is same as Theorem 21 except we invoke [18, ArXiv v2, Thm 29]. 


□ 


Remark 23. The existence of the interior point in Theorem 22 certihes the linear program has 
bounded optimum value. Standard tricks can be used to avoid requiring such interior point but 
may yield a more complicated looking running time (see Appendix E of [17] for instance). 
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7.2 and Regression 

The regression problem involves finding a vector x that minimize ||Ax — cj| for some n x d 
matrix A and some vector c. Recently, there has been mnch research [25, 26, 21, 2, 3] on solving 
overdetermined problems (i.e. n ^ d) as these arises natnrally in applications involving large 
datasets. While there has been recent snccess on achieving algorithms whose rnnning time is nearly 
linear inpnt pins something polynomial in d, in the case that p ^ 2 these algorithms achieve a 
polynomial dependence on the desired accnracy e [4] . Here we show how to improve the dependence 
on e by paying a mnltiplicative \/d factor and improve npon previons algorithms in this regime. 

Corollary 24. Let A G c G M"', and p = 1 or p = oo. There is an algorithm to find x sueh 

that 

||Ax —c|| <min||An —c|| +e||c|| 

II lip — ^ II w lip II lip 

in O (^Vd (nnz(A) + log time. 

Proof. The case is the special case of Theorem 22 with 5 = 0 and an explicit initial point 0. 

For the i°° case, we consider the following linear program 


rnin 

X,t 



t + II Ax — c — til 


+ Ax 


c + tl| 


where 1 G M"' is the all ones vector. For all t > 0, we have 


a — t| + |a + t| 


2t + (a — t) 
2t 

_ 2t + (—t — a) 


ii a > t 

if — t < a < t . 
if a < —t 


(7.3) 


Letting dist(o, [—t,t]) denote the distance from a to the interval [—t,t] (and |a| + |t| if t < 0) we 
then have that |a — t| + |a + t| = dist{a, [—t, t]) + 2t and conseqnently the linear program (7.3) is 
eqnivalent to 

t 

min /(x, t) = - + J2 dist ([Ax - c]^ , [-t, t]) . 
i=l 

Note that when t > ||Ax — cj|^ we have f{x,t) = | and when t < ||Ax — cj|^ we have f{x,t) > 
jIIAx —cj|^. Conseqnently, the linear program (7.3) is eqnivalent to icx) regression. To solve (7.3), 
we rewrite it as follows 


Since 



(7.4) 


2n—- ^ 

we can nse 2 n^ d initial point for Theorem 22 and apply it to (7.3) to hnd x, t as desired. □ 

Remark 25. We wonder if it is possible to obtain fnrther rnnning time improvements for solving 
regression when p {1, 2, oo}. 
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7.3 0(d) Rounding Ellipsoid for Polytopes 


For any convex set K, we call an ellipsoid E is an a-rounding E C K G aE. It is known that every 
convex set in d dimension has a d-rounding ellipsoid and that such rounding have many applications. 
(See [14, 35]) For polytopes K = {x ^ : Ax > b}, the previous best algorithm for finding a 

(1 + e)d rounding takes time 0(n^'^e~^) [14] and 0(nd‘^e~^) [15]. Here we show how to compute an 
0{d) rounding in time 0(\/d(nnz(A) + d^) log(C/)), which is enough for many applications. Note 
that this is an at least 0(\/d) improvement over previous results and for the case A is sparse, this 
is an 0(d^'^) improvement. 

We split our proof into two parts. First we provide a technical lemma. Lemma 26, which shows 
conditions under which we a point in a polytope is the center of a suitable ellipse and in Theorem 27 
we show how to us this Lemma to compute the ellipse. 


Lemma 26. Let P = {x : Ax >b} be a polytope for A G 

w G and let p(x) ~ ^ 

suppose that we have 7 > 1 such that 
ellipsoid E = {h G^'^ : ||^||y 2 p(,^ ) ^ 1} satisfies 


l>nxd 


and d < n. Furthermore, let 


for all X G P. Now let x* = arg min^gRd p(x) and 
i 

(A/i)j/(Ax* — b)i < 7 ||^||y 2 p(,j ) for all h G Then the 


X -\—E C P C X + 'y\\w\\,E . 
7 ^ 

Proof For any z G ^E, by the assumptions, we have that 


(7.5) 


max 

iS[n] 


(Az)i 

(Ax* - b)i 




< 1 . 


Consequently, for all z G ^E we have A(x* + z) > b and therefore x* + C P. 

To prove the other side of (7.5) let x G P be arbitrary and let s Ax — b, s* Ax* — b, 
S diag(^, S* = diag(x*), and W '= diag(t(;). By the optimality conditions on x* we have 

AWS^^l = Vp(x*) = 0. 


Consequently, 

0 = l^WS;^ A (x* - x) = (s, - ^ = 0 (7.6) 

and therefore as L^WSF^s* = ||dl||^ we have 1^WS7^% = ||dl||^ and therefore 




i=l 


[s*l? 




-2sJSf^WSf^s + \\Sf^ 


-ld|2 

w 


||S* ll^lll 
< ||wsr^%|U|sr^s^[| — ||t(;[L 

— II ^ ‘*"11111 ^ 11 00 II 111 


= \\W\ 


U|s7(s--s.)|| 


— Il^lll ■ 7II® “ 


I V^p(x») 


(Expanding the quadratic) 

(Using l^WS7^.s=||rtJ||j^) 
(Cauchy Schwarz) 
(Using l^WS7^s=||r(J||j^) 
(Assumption) 


Now, V^p(x*) = A^S^ ^A and therefore ~ II® “ ®*llv2p(x )' Dividing both 

sides of the above equation by ||x — ^*||y 2 p(,j ) yields that ||x — ^*||y 2 p(,j ) < 7||^lli yielding the 
right hand side of (7.5) and completing the proof. □ 
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Theorem 27. Let A G for d < n and suppose we have an initial point xq G such that 

Axq > b. Then, we can find an ellipsoid E such that E <Z {x ^ : Ax >b}c lOOd • E in time 


0{\/d (nnz(A) + d^) log{U)) where U 


max < max 


A.x>b 


I Ax — 6|| 


1 


A.XQ — b 


Proof. Given any interior point xq snch that Axq > b in [17] we showed how to hnd a weight w 
snch that Urdll^ < 3d and for any h G we have 


max 

ie[n] 


[Ah]i 
[Ax* - b]i 


< 3[|h 


V2p(x*) 


(7.7) 


where p{x) = — ^ 11=1 
hnd X G snch that 




and X* = argmin^^gp(y). Furthermore, we showed how to 


^ [A(x-x*)]f ^ 1 

U * (Ax* - b)j - 100 • 


(7.8) 


As discnssed in Theorem 21, this can be done in time 0{y/d (nnz(A) + d^) log(t/)). Conseqnently, 
we can nse Lemma 26 with 7 = 3 and ||hi||^ = 3d and this gives ns an ellipsoid E snch that 


X* + -E C P C X* + 9dE 

O 


where P = {h G : ||^||v 2 p( 3 j,) < !}■ 

Now, we show how to approximate E nsing x. (7.8) shows that that ||x — ) — ib’ 

therefore, we have x — x* G ^E and hence 


X + 




E d X* + 


1 

3 


P C P C X* + 9dP C X + 



P. 


Now, nsing (7.8) and (7.7), we have 

1 


A{x—Xf) 

Axt—b 


< ^, we have 


V^p(x) P V^p(x*) = A'^Sf^WSf^A P 


(1 + ^)^ 




V^p(x). 


Therefore, we have 


1 1 


X + ^-l^P' C P C x+ ^ E' 


1 + 


10 


1 - A 

^ 10 


where P' = {/i G : 1111y< !}• After rescaling the ellipsoid, we have the resnlt. □ 


7.4 Multicommodity Flow 

Here we show how onr algorithm can be nsed to improve the miming time for solving mnlticom- 
modity flow. Note that the resnlt presented here is meant primarily to illnstrate onr approach, we 
believe it can be fnrther improved nsing the techniqnes in [10, 18]. 

For simplicity, we focns on the maximnm concnrrent flow problem. In this problem, we are 
given a graph G = {V,E) k sonrce sink pairs {si,ti) G , and capacities c G and wish 

to compnte the maximnm a G M snch that we can simnltaneonsly for all i G [k] ronte a nnit of 
flow fi G between Si and ti while maintaining the capacity constraint X)^=i — c(e) for all 
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e € E. There are no combinatorial algorithms known and there are multiple algorithms to compute 
a (1 — e) optimal flow in time polynomial in |£1|, \V\ and [6, 11, 8, 23]. In this regime, the 
fastest algorithm for directed graphs takes 0{{\E\ + k)\V\e~'^ logU) [23] and the fastest algorithm 
for undirected graphs takes k'^e~‘^ log*^*-^^ U) [13] where U is the maximum capacity. For 

linear convergence, the previous best algorithm is a specialized interior point method that takes 
time 0{^/\E\k\V\‘^k‘^ log{e~^)) [10]. 

To solve the concurrent multicommodity flow problem we use the following linear program 


max 

a 


ff(e) 

= 

'Y, fi{e) for all edges e. 


= 

0 for all vertices u ^ 

vev 

Y] fi{si,v) 

= 

a for all i, 

v&V 

c(e) > 

Me) 

> 0 for all edges e. 

c(e) > 

die) 

> 0 for all edges e. 


Note that there are 0{k\E\) variables, 0(A:|1/| + |i?|) equality constraints, and 0{k\E\) non-zeroes 
in the constraint matrix. Furthermore, it is easy to hnd an initial point by computing a short¬ 
est path for each Si and ti and sending a small amount of flow along that path. Also, given 
any almost feasible flow, one can make it feasible by scaling and send excess flow at every ver¬ 
tex back to Si along some spanning tree. Therefore, Theorem 21 gives an algorithm that takes 
d{^y\E\ + k\V\ {k\E\ + {\E\ + k\v\)^) logiU/e)) = O {{\E\ + k\V\)'^-^ logiU/e)) time. Note that 
this is faster than the previous best algorithm when k > (IFII/Il/I)*^'®. 

7.5 Minimum Cost Flow 

The minimum cost flow problem and the more specihc, maximum flow problem, are two of the most 
well studied problems in combinatorial optimization [31]. Many techniques have been developed for 
solving this problem. Yet, our result matches the fastest algorithm for solving these problems on 
dense graphs [18] without using any combinatorial structure of this problem, in particular, Laplacian 
system solvers. We emphasize that this result is not a running time improvement, rather just a 
demonstration of the power of our result and an interesting statement on efficient maximum flow 
algorithms. 

Corollary 28. There is an 0{\V\'^'^ (C)) time algorithm to compute an exact minimum cost 

maximum flow for weighted directed graphs with |F| vertices, \E\ edges and integer capacities and 
integer cost at most U. 

Proof. The proof is same as [18, ArXiv v2, Thm 34,35] except that we use Theorem 21 to solve the 
linear program. The proof essentially writes the minimum cost flow problem into a linear program 
with an explicit interior point and shows how to round an approximately optimal solution to a 
vertex of the polytope. To perform this rounding, we need to a fractional solution with error less 
than 0{ po|y([y|^) ) and which yields the log([/) term in the running time. □ 

7.6 Convex Problems 

Many problems in convex optimization can be efficiently reduced to the problem of hnding a point 
in a convex set K given a separation oracle. Recall that given a point x, the separation oracle 
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either outputs that x is in /C or outputs a separating hyperplane that separates the input point x 
and the convex set K. In [20], they showed that how to make use of our fast algorithms for inverse 
maintenance problem under the K stability assumption to obtain the following improved running 
time for this fundamental problem: 

Theorem 29 ([20]). Given a non-empty convex set K C that is contained in a box of radius R, 
i.e. maxjjgi^ Halloo — given a separation oracle for K that takes 0{'T) time for each 

call. For any 0 < e < R, we can either finds x £ K or proves that K does not contains a ball with 
radius e in time 0{dTlog{dR/e) -\-d^ log^^^\dR/e)). 

Remark 30. [20] uses Theorem 19 to solve the linear systems involved in their cutting plane method. 
However, their inverse maintenance problem satishes the stability assumption with 1 rows addition 
and removal. Furthermore, the A involved has only 0{d) many rows. Therefore, we believe a simple 
variant of [33] may also suffice for that paper. 

8 Open Problem: Sampling from a Polytope 

Sampling a random point in convex sets is a fundamental problem in convex geometry with numerous 
applications in optimization, counting, learning and rounding [35]. Here consider a typical case 
where the convex set is a polytope that is explicitly given as {x G : Ax > 6} for A G 
The current fastest algorithm for this setting is Hit-and-Run [22] and Dikin walk [9]. 

Given an initial random point in the set, Hit-and-Run takes 0*{d^) iterations and each iteration 
takes time 0*(nnz(A)) while Dikin walk takes 0*{nd) iterations and each iteration takes time 
O* where the O* notation omits the dependence on error parameters and logarithmic terms. 

Each iteration of Dikin walk is expensive because it solves a linear system to obtain the next point 
and computes determinants to implement an importance sampling scheme. The linear systems can 
be solved in amortized cost 0*(nnz(A) -|- d?) by the inverse maintenance machinery presented in 
this paper. Unfortunately it is not known how to use this machinery to efficiently compute the 
determinant to sufficient accuracy to suffice for this method. 

We leave it as an open problem whether it is possible to circumvent this issue and improve the 
running time of a method like the Dikin walk. In an older version of this paper, we mistakenly 
claimed an improved running time for Dikin walk by noting solely the improved running time for 
linear system solving and ignoring the determinant computation. We thank Hariharan Narayanan 
for pointing out this mistake. 
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A Relationships Between T-time Linear Solver and Inverse Matrix 


In this section, we prove relationships between linear T-time solvers and inverse matrices. In 
Lemma 31 we show that obtaining a spectral approximation to the inverse of a matrix suffices to 
obtain a linear solver. In Lemma 32, we show how a linear solver of a matrix yields a spectral 
approximation to the inverse of that matrix. 


Lemma 31. Given a PD matrix M and a symmetric matrix N ~o(i) ^ ^ applied to 

a vector in 0(fT) time we have a linear (nnz(M) -\-‘T)-time solver S o/M. 


Proof. Since N 


iQ(i) M ^ there is a constant L such that ^ ^ M ^ LN. 


algorithm S(5, e) T b where 




k=0 


mnM 

^) 


Consider the 


where | log(eL“'^)/log(l — L~‘^). This is the linear operator corresponding to performing z^ 

steps of preconditioned gradient descent to solve the linear system in M. Clearly, applying can 
be done in time 0((nnz(A) -I- T)ze) = 0((nnz(A) -|- T) log(e“^)). Therefore, all that remains is to 
show that S(5, e) is a solver for M. 

Note that we can rewrite Qe equivalently as 


Q. 


1 

L 


nV2^ 


/c=0 


) k 

nV2 


and hence it is symmetric. Furthermore, since clearly ^ ^ I we have that 


M-i - Q. = 


= 1 n 1/2 _ ^^ 1/2 ^ ^ 1/2 


L J J L 


= ^ p^i/2 _ 

fe=z,+l ^ ' 

Using the above two inequalities, we have that 


\s{b,e)-M-^ 


iri|2 
M 


< - 


k=z^+l ^ ^ 


M 


k=Ze+i 
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and since using that 0 ^ 1 — ^ (l — 7 ^) I, 

CO / 1 \ k 


we 


have 


0 ^ ^ < L‘^ (l- ^ 

k=Ze+l ^ / V . 


Consequently 

\\s{b,e)-M-m^<L^(l- 

'. that Ze was chosen 


2 Ze 


L2 


^ A - 


22:e 


L2, 


M-^b 


Thus, we see 

such that S{b, e) = Q, 


\ ^ / 2 

precisely to complete the proof. 

' 7 ,7 , \ b for e £ [0,0.1]. Then, we 

/,M-i 

have that for all b, 


□ 


Lemma 32. Let S be a linear solver of M 

QfMQ, and Q.MQJ’ M 

Proof. By the dehnition of S and standard inequalities, we 

- + (1 + ^) id''^ ‘"iiM 

^^)||M-l6||^<(l + 3^/i)||M-lh"' 


have 


IIL 


||s(6,e)| 

< {e + yfe + l + 

On the another hand, we have that for all b 


-in 1 2 


|M"^6 


M 


IIm - (1 + 


^1 + ||5(6, e) - M + (1 + -v/e) 115(6, e)| 


< 


2V6| 


M~^b 


|5(6,e) -M -^6 

M + (1 + \/^) 


2 

M 


2 

M’ 


Combining these two inequalities and using the dehnition of S, we see that for all 6 we have 

1 - 2^ \ grqT^Q^g^ {l + 3^/f)i^M-^b 

1 + ^/e J 

Using a Taylor expansion of e and the dehnition of ~ then yields Q^MQ^ ~ 4 yi M~^. In other 
words, all eigenvalues of (M^/ 2 q^]V[ 1 / 2 ^^ (M^/ 2 Q^]y[i/ 2 ^ lies between e~^^ and In general, 

for any square matrix B, the set of eigenvalues of equals to the set of eigenvalues of BB^ 

because of the SVD decomposition. Hence, all eigenvalues of (M^/^QeM^/ 2 ^ (M^/ 2 q^]V[ 1 / 2 ^^ ^IgQ 
lies in the same region. This proves M”^. □ 


B Remarks for Figure 1.1 

Figure 1.1 shows the previous fastest algorithm for linear programming min^^^gcr^x where A G 
running time described in the hgure comes from the following algorithms: 

1. O [y/nz + y/nd? + is achieved by the interior point method of Vaidya [33]. 

2. O {^\/nz + nd}'^^ + is achieved by using the Karmarkar acceleration scheme [12] on 

the short step path following method. See [28] for the details. 

3. 0{^/n{z + is achieved by using the currently best linear system solver [26, 21, 7, 3] on 

the short step path following method [30]. 

4. 0{d{z + d^'^®)) is achieved by the cutting plane method of Vaidya [34]. 
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